Abstract

Replicating or caching popular content in memories distributed across the network is a technique to reduce 
peak network loads. Conventionally, the main performance gain of this caching was thought to result from making 
part of the requested data available closer to end users. Instead, we recently showed that a much more significant 
gain can be achieved by using caches to create multicasting opportunities, even for users with different demands, 
through coding across data streams. These simultaneous coded-multicasting opportunities are enabled by careful 
content overlap at the various caches in the network, created by a central coordinating server. 

In many scenarios, such a central coordinating server may not be available, raising the question of whether this 
multicasting gain can still be achieved in a more decentralized setting. In this paper, we propose an efficient caching 
scheme, in which the content placement is performed in a decentralized manner. In other words, no coordination is 
required for the content placement. Despite this lack of coordination, the proposed scheme is nevertheless able to 
create simultaneous coded-multicasting opportunities, and hence achieves a rate close to the centralized scheme. 


I. Introduction 

Traffic in content delivery networks exhibits strong temporal variability, resulting in congestion during 
peak hours and underutilization during off-peak hours. It is therefore desirable to try to "shift" some of 
the traffic from peak to off-peak hours. One approach to achieve this is to exploit idle network resources 
to duplicate some of the content in memories distributed across the network. This duplication of content 
is called content placement or caching. The duplicated content can then be used during peak hours to 
reduce network congestion. 

From the above description, it is apparent that the network operates in two different phases: a content 
placement phase and a content delivery phase. In the placement phase, the network is not congested, and 
the system is constrained mainly by the size of the cache memories. In the delivery phase, the network 
is congested, and the system is constrained mainly by the rate required to serve the content requested 
by the users. The goal is thus to design the placement phase such that the rate in the delivery phase is 
minimized. 

There are two fundamentally different approaches, based on two distinct understandings of the role of 
caching, for how the placement and the delivery phases are performed. 

• Providing Content Locally: In the first, conventional, caching approach, replication is used to make 
part of the requested content available close to the end users. If a user finds part of a requested file in 
a close-by cache memory, that part can be served locally. The central content server only sends the 
remaining file parts using simple orthogonal unicast transmissions. If more than one user requests 
the same file, then the server has the option to multicast a single stream to those users. 
Extensive research has been done on this conventional caching approach, mainly on how to exploit 
differing file popularities to maximize the caching gain [1]-[7]. The gain of this approach is proportional 
to the fraction of the popular content that can be stored locally. As a result, this conventional 
caching approach is effective whenever the local cache memory is large enough to store a significant 
fraction of the total popular content. 

• Creating Simultaneous Coded-Multicasting Opportunities: In this approach, which we recently proposed 
in [8], content is placed in order to allow the central server to satisfy the requests of several 
users with different demands with a single multicast stream. The multicast streams are generated by 
coding across the different files requested by the users. Each user exploits the content stored in the 
local cache memory to enable decoding of its requested file from these data streams. Since the content 
placement is performed before the actual user demands are known, it has to be designed carefully 
such that these coded-multicasting opportunities are available simultaneously for all possible requests. 
In [8], we show that this simultaneous coded-multicasting gain can significantly reduce network 
congestion. Moreover, for many situations, this approach results in a much larger caching gain than 
the one obtained from the conventional caching approach discussed above. Unlike the conventional 
approach, the simultaneous coded-multicast approach is effective whenever the aggregate global cache 
size (i.e., the cumulative cache available at all users) is large enough compared to the total amount 
of popular content, even though there is no cooperation among the caches. 
As mentioned above, the scheme proposed in [8] relies on a carefully designed placement phase in 
order to create coded-multicasting opportunities among users with different demands. A central server 
arranges the caches such that every subset of the cache memories shares a specific part of the content. It 
is this carefully arranged correlation among the cache memories that guarantees the coded-multicasting 
opportunities simultaneously for all possible user demands. 

While the assumption of a centrally coordinated placement phase was helpful to establish the new 
caching approach in [8], it raises questions about its applicability. For example, the identity or even just 
the number of active users in the delivery phase may not be known several hours in advance during 
the placement phase. As another example, in some cases the placement phase could be performed in one 
network, say a WiFi network, to reduce congestion in the delivery phase in another network, say a cellular 
network. In either case, coordination in the placement phase may not be possible. 

This raises the important question whether lack of coordination in the placement phase eliminates the 
significant rate reduction promised by the simultaneous coded-multicast approach proposed in [8]. Put 
differently, the question is if simultaneous coded-multicasting opportunities can still be created without a 
centrally coordinated placement phase. 

In this paper, we answer this question in the positive by developing a caching algorithm that creates 
simultaneous coded-multicasting opportunities without coordination in the placement phase. More precisely, 
the proposed algorithm is able to operate in the placement phase with an unknown number of 
users situated in isolated networks and acting independently from each other. Thus, the placement phase 
of the proposed algorithm is decentralized. In the delivery phase, some of these users are connected to a 
server through a shared bottleneck link. In this phase, the server is first informed about the set of active 
users, their cache contents, and their requests. The proposed algorithm efficiently exploits the multicasting 
opportunities arranged in the placement phase in order to minimize the rate over the shared bottleneck 
link. We show that our proposed decentralized algorithm can significantly improve upon the conventional 
scheme. Moreover, we show that the performance of the proposed decentralized caching scheme is close 
to that of the centralized scheme of [8]. 

These two claims are illustrated in Fig. 1 for a system with 20 users and 100 pieces of content. For 
example, when each user is able to cache 25 of the files, the peak rate of the conventional scheme 
is equivalent to transmitting 15 files. However, in the proposed scheme, the peak rate is equivalent to 
transmitting only about 3 files. Observe also that the rate penalty for decentralization of the placement 
phase of the caching system is modest. 

The remainder of this paper is organized as follows. Section II formally introduces the problem setting. 
Section III presents the proposed algorithm. Section IV explains through various examples how to adjust 
the proposed algorithm to handle various constraints arising in practice. 

II. Problem Setting 

To gain insight into how to optimally operate content-distribution systems, we introduce here a basic 
model for such systems capturing the fundamental challenges, tensions, and tradeoffs in the caching 
problem. For the sake of clarity, we initially study the problem under some simplifying assumptions, 
which will be relaxed later, as is discussed in detail in Section IV. 

Fig. 1. Performance of different caching schemes for a system with 20 users connected to a server storing 100 files through a shared 
bottleneck link. The horizontal axis is the size of the cache memory (normalized by the file size) at each user; the vertical axis shows the 
peak rate (again normalized by the file size) over the shared link in the delivery phase. The solid black curve depicts the rate achieved 
by the decentralized caching scheme proposed in this paper; the dashed green curve depicts the rate achieved by the conventional caching 
scheme advocated in the prior literature; the dashed blue curve depicts the rate achieved by the centralized caching algorithm from the recent 
paper [8]. 

We consider a content-distribution system consisting of a server connected through an error-free¹ shared 
(bottleneck) link to K users. The server stores N files each of size F bits. The users each have access to 
a cache able to store MF bits for $M \in [0, N]$. This situation is illustrated in Fig. 2. 






Fig. 2. Caching system considered in this paper. A server containing N files of size F bits each is connected through a shared link to K 
users each with a cache of size MF bits. In the figure, N = K = 3 and M = 1. 

The system operates in two phases: a placement phase and a delivery phase. The placement phase 
occurs when the network load is low. During this time, the shared link can be utilized to fill the caches 
of the users. The main constraint in this phase is the size of the cache memory at each user. The delivery 
phase occurs after the placement phase when the network load is high. At this time, each user requests 
one file from the server, which proceeds to transmit its response over the shared link. Given the output 
of the shared link (observed by all users) and its cache content, each user should be able to recover the 
requested file. The main constraint in this phase is the congestion of the shared link. The objective is 
to minimize the worst-case (over all possible requests) rate of transmission over the shared link that is 
needed in the delivery phase. 

¹ Any errors in this link have presumably already been taken care of using error-correction coding. 

We now formalize this problem description. In the placement phase, each user is able to fill its cache 
as an arbitrary function (linear, nonlinear, . . . ) of the N files subject only to its memory constraint of 
MF bits. We emphasize that the requests of the users are not known during the placement phase, and 
hence the caching function is not allowed to depend on them. 

In the delivery phase, each of the K users requests one of the N files and communicates this request 
to the server. Let $d_k \in \{1, \dots, N\}$ be the request of user $k \in \{1, \dots, K\}$. The server replies to these 
requests by sending a message over the shared link, which is observed by all the K users. Let $R^{(d_1, \dots, d_K)} F$ 
be the number of bits in the message sent by the server. We impose that each user is able to recover its 
requested file from the content of its cache and the message received over the shared link with probability 
arbitrarily close to one for large enough file size F. Denote by 

$$R \triangleq \max_{d_1, \dots, d_K} R^{(d_1, \dots, d_K)}$$

the worst-case normalized rate for a caching scheme. 

Our objective is to minimize R in order to minimize the worst-case network congestion RF during the 
delivery phase. Clearly, R is a function of the cache size MF. In order to emphasize this dependence, 
we will usually write the rate as R(M). The function R(M) expresses the memory-rate tradeoff of the 
content-distribution system. 

The following example illustrates the definitions and notation, and introduces the conventional caching 
scheme advocated in most of the prior literature. This conventional caching scheme will be used as a 
benchmark throughout the paper. 

Example 1 (Conventional Scheme). Consider the caching scenario with N = 2 files and K = 2 users 
each with a cache of size M = 1 (see Fig. 3). In the conventional scheme, each of the two files A and B 
is split into two parts of equal size, namely $A = (A_1, A_2)$ and $B = (B_1, B_2)$. In the placement phase, 
both users cache $(A_1, B_1)$, i.e., the first part of each file. Since each of these parts has size F/2, this 
satisfies the memory constraint of MF = F bits. 






Fig. 3. Conventional caching strategy advocated in the prior literature for N = 2 files and K = 2 users each with a cache of size M = 1. 

Consider now the delivery phase of the system. Assume that each user requests the same file A, i.e., 
$d_1 = d_2 = 1$. The server responds by sending the file part $A_2$. Clearly, from their cache content and 
the message received over the shared link, each user can recover the requested file $A = (A_1, A_2)$. The 
(normalized) rate in the delivery phase is 

$$R^{(1,1)} = 1/2.$$

Assume instead that user one requests file A and user two requests file B, i.e., $d_1 = 1$ and $d_2 = 2$. The 
server needs to transmit $(A_2, B_2)$ to satisfy these requests, resulting in a rate in the delivery phase of 

$$R^{(1,2)} = 1.$$

It is easy to see that this is the worst-case request, and hence 

$$R = 1$$

for this scheme. 

We refer to the strategy described above as the conventional caching scheme, since it is the strategy 
used by traditional caching systems. For general N, K, and M, this scheme caches the first M/N fraction 
of each of the N files. Therefore, in the delivery phase, the server has to send the remaining 1 - M/N 
fraction of the requested files. The resulting rate in the delivery phase, denoted by $R_C(M)$ for future 
reference, is 

$$R_C(M) \triangleq K \cdot (1 - M/N) \cdot \min\{1, N/K\}.$$

For N = K = 2 and M = 1, this yields 

$$R_C(1) = 1,$$

as before. 
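The benchmark rate above is easy to evaluate numerically. The following is a minimal sketch, not part of the original scheme description; the function name `conventional_rate` is a hypothetical helper chosen for illustration. It evaluates $R_C(M)$ for the setting of Example 1 and for the 20-user, 100-file setting discussed in Section I.

```python
def conventional_rate(N: int, K: int, M: float) -> float:
    """Worst-case normalized rate R_C(M) = K (1 - M/N) min{1, N/K}.

    Hypothetical helper: N files, K users, cache size M (in units of files).
    """
    if not 0 <= M <= N:
        raise ValueError("cache size M must lie in [0, N]")
    return K * (1 - M / N) * min(1.0, N / K)


if __name__ == "__main__":
    # Example 1: N = K = 2 and M = 1.
    print(conventional_rate(2, 2, 1))
    # Setting of Fig. 1: 20 users, 100 files, each user caches 25 files.
    print(conventional_rate(100, 20, 25))
```

For the second call the result is 15 files, matching the value quoted for the conventional scheme in the introduction.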

As we will see, this conventional caching scheme can be significantly improved upon. In particular, see 
Example 2 in Section III-A. 

One important feature of the conventional scheme introduced in Example 1 is that it has a decentralized 
placement phase. By that, we mean that the cache of each user is filled independently of other users. 
In particular, the placement operation of a given user depends neither on the identity nor on the number of 
other users in the system. As a result, the users could, in fact, contact different servers at different times 
for the placement phase. Having a decentralized placement phase is thus an important robustness property 
for a caching system. This is discussed in detail in Section IV-B. 

As was mentioned earlier, the system description introduced in this section makes certain simplifying 
assumptions. In particular, we assume a system having a single shared broadcast link, with a cache at 
each user of known content in the delivery phase, and we focus on worst-case, synchronized user requests 
in the delivery phase for a single file of equal size. All these assumptions can be relaxed, as is discussed 
in Section IV-A. 

III. Main Results 

We start in Section III-A by presenting a new caching algorithm with a decentralized placement 
phase. Section III-B compares the proposed algorithm to the conventional caching scheme introduced 
in Example 1 in Section II, which is advocated in most of the literature and which is the best previously 
known algorithm with a decentralized placement phase. Finally, Section III-C compares the proposed 
algorithm to the best known centralized caching scheme. 

A. A Decentralized Caching Algorithm 

We now present an algorithm (referred to as Algorithm 1 in the following) for the caching problem. In 
the statement of the algorithm, we use the notation $V_{k,S}$ to denote the bits of the file $d_k$ requested by user 
k cached exclusively at users in S. In other words, a bit of file $d_k$ is in $V_{k,S}$ if it is present in the cache 
of every user in S and if it is absent from the cache of every user outside S. We also use the notation 
$[K] \triangleq \{1, 2, \dots, K\}$ and $[N] \triangleq \{1, 2, \dots, N\}$. 

Algorithm 1 consists of a placement procedure and two delivery procedures. In the placement phase, we 
always use the same placement procedure. In the delivery phase, the server chooses the delivery procedure 
minimizing the resulting rate over the shared link. 

Algorithm 1 
 1: procedure Placement 
 2:     for $k \in [K]$, $n \in [N]$ do 
 3:         User k caches a random subset of MF/N bits of file n 
 4:     end for 
 5: end procedure 

 6: procedure Delivery($d_1, \dots, d_K$) 
 7:     for s = K, K - 1, ..., 1 do 
 8:         for $S \subset [K] : |S| = s$ do 
 9:             Server sends $\oplus_{k \in S} V_{k, S \setminus \{k\}}$ 
10:         end for 
11:     end for 
12: end procedure 

13: procedure Delivery'($d_1, \dots, d_K$) 
14:     for $n \in [N]$ do 
15:         Server sends enough random linear combinations of bits in file n for all users requesting it to decode 
16:     end for 
17: end procedure 



Remark 1: For the first delivery procedure, the $\oplus$ operation in Line 9 of Algorithm 1 represents the 
bit-wise XOR operation. All elements $V_{k, S \setminus \{k\}}$ are assumed to be zero padded to the length of the longest 
element. For the second delivery procedure, an elementary analysis shows that the number of linear 
combinations that need to be sent in Line 15 is on the order of $F(1 - M/N) \min\{N, K\}$ for large file 
size F. 
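The placement procedure and the first delivery procedure can be exercised in a small simulation. The sketch below is illustrative only and rests on assumptions that are not in the text: files are lists of bits, users and files are indexed from 0, every user knows which bit positions each transmission covers (the `parts` metadata), and reading `files[...][b]` inside `decode` stands in for a lookup in the user's own cache (an assertion checks that the bit really is cached). All function names are hypothetical.

```python
import itertools
import random


def placement(N, K, F, M, rng):
    """Placement: every user caches M*F/N random bit positions of each file."""
    per_file = int(M * F / N)
    return [[set(rng.sample(range(F), per_file)) for _ in range(N)]
            for _ in range(K)]


def exclusive_bits(caches, file_idx, holders, F):
    """Sorted bit positions of one file cached by exactly the users in `holders`."""
    users = range(len(caches))
    return [b for b in range(F)
            if all((b in caches[u][file_idx]) == (u in holders) for u in users)]


def deliver(files, caches, demands):
    """First delivery procedure: one zero-padded XOR per nonempty user subset S."""
    K, F = len(caches), len(files[0])
    transmissions = []
    for s in range(K, 0, -1):
        for S in itertools.combinations(range(K), s):
            # parts[k] lists the bit positions that form V_{k, S \ {k}}.
            parts = {k: exclusive_bits(caches, demands[k], set(S) - {k}, F)
                     for k in S}
            payload = [0] * max(len(p) for p in parts.values())
            for k, positions in parts.items():
                for i, b in enumerate(positions):
                    payload[i] ^= files[demands[k]][b]
            if payload:
                transmissions.append((S, parts, payload))
    return transmissions


def decode(k, files, caches, demands, transmissions):
    """User k removes the terms it has cached from every XOR addressed to it."""
    d = demands[k]
    known = {b: files[d][b] for b in caches[k][d]}
    for S, parts, payload in transmissions:
        if k not in S:
            continue
        residual = list(payload)
        for j in S:
            if j == k:
                continue
            for i, b in enumerate(parts[j]):
                assert b in caches[k][demands[j]]  # bit is in user k's cache
                residual[i] ^= files[demands[j]][b]
        for i, b in enumerate(parts[k]):
            known[b] = residual[i]
    return [known[b] for b in range(len(files[0]))]


if __name__ == "__main__":
    rng = random.Random(0)
    N, K, F, M = 3, 3, 600, 1
    files = [[rng.randint(0, 1) for _ in range(F)] for _ in range(N)]
    caches = placement(N, K, F, M, rng)
    demands = [0, 1, 2]
    sent = deliver(files, caches, demands)
    ok = all(decode(k, files, caches, demands, sent) == files[demands[k]]
             for k in range(K))
    rate = sum(len(payload) for _, _, payload in sent) / F
    print("all users decoded:", ok, "| simulated normalized rate:", round(rate, 3))
```

Because of the zero padding, the simulated rate for a finite file size is slightly larger than the asymptotic value; the gap shrinks as F grows.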

We illustrate the algorithm with an example. 

Example 2 (Illustration of Algorithm 1). Consider the caching problem with N = 2 files A and B, and 
K = 2 users each with a cache of size M. In the placement phase of Algorithm 1, each user caches a 
subset of MF/2 bits of each file independently at random. As a result, each bit of a file is cached by a 
specific user with probability M/2. Let us focus on file A. The placement procedure splits file A into 4 
subfiles, 

$$A = (A_\emptyset, A_1, A_2, A_{1,2}),$$

where, for $S \subset \{1, 2\}$, $A_S$ denotes the bits of file A that are stored in the cache memories of exactly the users in S.² 
For example, $A_{1,2}$ are the bits of A available in the cache memory of users one and two, whereas $A_1$ 
are the bits of A available exclusively in the cache memory of user one. 
By the law of large numbers, 

$$|A_S| \approx (M/2)^{|S|} (1 - M/2)^{2 - |S|} F$$

with probability approaching one for large enough file size F. Therefore, we have with high probability, 

• $|A_\emptyset|/F$ is approximately $(1 - M/2)^2$, 

• $|A_1|/F$ and $|A_2|/F$ are approximately $(M/2)(1 - M/2)$, 

• $|A_{1,2}|/F$ is approximately $(M/2)^2$. 

The same analysis holds for file B. 
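The law-of-large-numbers statement can be checked empirically. The sketch below is a hedged illustration, not the authors' code: it draws the two users' random caches for file A with Python's `random.sample`, measures the four subfile fractions, and compares them with the predicted values. The helper names, the dictionary keys, and the chosen file size are assumptions made for the example.

```python
import random


def subfile_fractions(F: int, M: float, seed: int = 0) -> dict:
    """Empirical |A_S|/F for K = 2 users that each cache MF/2 random bits of A."""
    rng = random.Random(seed)
    per_user = int(M * F / 2)
    cache1 = set(rng.sample(range(F), per_user))
    cache2 = set(rng.sample(range(F), per_user))
    both = len(cache1 & cache2)
    return {
        "empty": (F - len(cache1 | cache2)) / F,  # cached by nobody
        "1": (len(cache1) - both) / F,            # cached only by user one
        "2": (len(cache2) - both) / F,            # cached only by user two
        "1,2": both / F,                          # cached by both users
    }


def predicted_fraction(size_of_S: int, M: float) -> float:
    """Predicted fraction (M/2)^|S| (1 - M/2)^(2 - |S|)."""
    q = M / 2
    return q ** size_of_S * (1 - q) ** (2 - size_of_S)


if __name__ == "__main__":
    measured = subfile_fractions(F=100_000, M=1.0)
    for label, size in (("empty", 0), ("1", 1), ("2", 1), ("1,2", 2)):
        print(label, round(measured[label], 4), predicted_fraction(size, 1.0))
```

The four fractions always sum to one, since every bit of A falls into exactly one subfile.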

We now consider the delivery phase in Algorithm 1. As we will see later (see Remark 2 below), for 
the scenario at hand the first delivery procedure will be used. Assume that users one and two request files 
A and B, respectively. 

² To avoid heavy notation, we write $A_{1,2}$ as shorthand for $A_{\{1,2\}}$. Similarly, we write $V_{1,2}$ for $V_{1,\{2\}}$. 

The iteration in Line 7 of Algorithm 1 starts with s = 2. By Line 8, this implies that we consider the 
set S = {1, 2}. Observe that 

• The cache of user two contains $A_2$, which is needed by user one. Hence, $V_{1,2} = A_2$. 

• The cache of user one contains $B_1$, which is needed by user two. Hence, $V_{2,1} = B_1$. 

As a result, in Line 9 of Algorithm 1, the server transmits $A_2 \oplus B_1$ over the shared link. User one can solve 
for $A_2$ from the received message $A_2 \oplus B_1$ and the cached subfile $B_1$. User two can solve for $B_1$ from 
the message $A_2 \oplus B_1$ and the cached subfile $A_2$. Therefore, $A_2 \oplus B_1$ is simultaneously useful for s = 2 
users. Thus, even though the two users request different files, the server can successfully multicast useful 
information to both of them. We note that the normalized (by F) size of $A_2 \oplus B_1$ is $(M/2)(1 - M/2)$. 

The second iteration in Line 7 is for s = 1. In this iteration, the server simply sends $V_{1,\emptyset} = A_\emptyset$ and 
$V_{2,\emptyset} = B_\emptyset$ in Line 9. Each of these transmissions has normalized size $(1 - M/2)^2$. 

From $A_2$ computed in iteration one, $A_\emptyset$ received in iteration two, and its cache content $(A_1, A_{1,2})$, user 
one can recover the requested file $A = (A_\emptyset, A_1, A_2, A_{1,2})$. Similarly, user two can recover the requested 
file B. 

Summing up the contributions for s = 2 and s = 1, the aggregate size (normalized by F) of the 
messages sent by the server is 

$$(M/2)(1 - M/2) + 2(1 - M/2)^2,$$

which can be rewritten as 

$$2 \cdot (1 - M/2) \cdot \frac{1}{M} \left(1 - (1 - M/2)^2\right).$$

In particular, for M = 1, the rate of Algorithm 1 is 3/4. This compares to a rate of $R_C(1) = 1$ achieved 
by the conventional scheme described in Example 1 in Section II. While the improvement in this scenario 
is relatively small, as we will see shortly, for larger values of N and K, this improvement over the 
conventional scheme can be large. 
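The equality of the two expressions for the aggregate message size, and the value 3/4 at M = 1, can be confirmed with exact rational arithmetic. This is a small check written for illustration; the function names are hypothetical.

```python
from fractions import Fraction


def example2_rate_sum(M) -> Fraction:
    """Sum of the s = 2 and s = 1 contributions: q(1 - q) + 2(1 - q)^2, q = M/2."""
    q = Fraction(M) / 2
    return q * (1 - q) + 2 * (1 - q) ** 2


def example2_rate_closed_form(M) -> Fraction:
    """Rewritten form: 2 (1 - M/2) (1/M) (1 - (1 - M/2)^2), valid for M > 0."""
    q = Fraction(M) / 2
    return 2 * (1 - q) * (1 - (1 - q) ** 2) / Fraction(M)


if __name__ == "__main__":
    for M in (Fraction(1, 2), 1, Fraction(3, 2), 2):
        print(M, example2_rate_sum(M), example2_rate_closed_form(M))
```

Both functions agree because $q(1 - q) + 2(1 - q)^2 = (1 - q)(2 - q)$ and $1 - (1 - q)^2 = q(2 - q)$ with $q = M/2$.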

We point out that the placement part of Algorithm 1 is decentralized, in the sense that the cache of 
each user is filled independently of other users. This decentralization of the placement phase enables the 
content-distribution system to be much more flexible than a centralized placement phase. In particular, it 
implies that the identity or the number of users sharing the same bottleneck link during the 
delivery phase need not be known during the earlier placement phase. This and other system aspects of 
Algorithm 1 are elaborated on in Section IV-B. 

The performance of Algorithm 1 is analyzed in the next theorem, whose proof can be found in 
Appendix A. 

Theorem 1. Consider the caching problem with N files each of size F bits and with K users each having 
access to a cache of size MF bits with $M \in (0, N]$. Algorithm 1 is correct and, for F large enough, 
achieves rate arbitrarily close to 

$$R_1(M) \triangleq K \cdot (1 - M/N) \cdot \min\left\{ \frac{N}{KM} \left(1 - (1 - M/N)^K\right), \frac{N}{K} \right\}.$$

Remark 2: We note that if $N \ge K$ or $M \ge 1$, then the minimum in $R_1(M)$ is achieved by the first 
term so that 

$$R_1(M) = K \cdot (1 - M/N) \cdot \frac{N}{KM} \left(1 - (1 - M/N)^K\right).$$

This is the rate of the first delivery procedure in Algorithm 1. Since $N \ge K$ or $M \ge 1$ is the regime of 
most interest, the majority of the discussion in the following focuses on this case. 

Remark 3: Theorem 1 is only stated for M > 0. For M = 0, Algorithm 1 is easily seen to achieve a 
rate of 

$$R_1(0) = \min\{N, K\}.$$

We see that $R_1(0)$ is the continuous extension of $R_1(M)$ for M > 0. To simplify the exposition, we will 
not treat the case M = 0 separately in the following. 
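The rate expression of Theorem 1, together with the M = 0 case of Remark 3, can be evaluated directly. The following sketch is illustrative; `decentralized_rate` is a hypothetical name. It reproduces the value 3/4 from Example 2 and evaluates the 20-user, 100-file setting with M = 25 mentioned in Section I.

```python
def decentralized_rate(N: int, K: int, M: float) -> float:
    """Rate R_1(M) of Theorem 1, extended to M = 0 as in Remark 3."""
    if not 0 <= M <= N:
        raise ValueError("cache size M must lie in [0, N]")
    if M == 0:
        return float(min(N, K))
    q = M / N
    first = (N / (K * M)) * (1 - (1 - q) ** K)  # first delivery procedure
    second = N / K                              # second delivery procedure
    return K * (1 - q) * min(first, second)


if __name__ == "__main__":
    print(decentralized_rate(2, 2, 1))                   # Example 2 with M = 1
    print(round(decentralized_rate(100, 20, 25), 3))     # setting of Fig. 1
    print(decentralized_rate(100, 20, 0))                # Remark 3
```

In the Fig. 1 setting the result is just under 3 files, consistent with the "about 3 files" quoted in the introduction.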

We point out that the rate $R_1(M)$ of Algorithm 1 consists of three distinct factors. The first factor is 
K; this is the rate without caching. The second factor is (1 - M/N); this is a local caching gain that 
results from having part of the requested file already available in the local cache. The third factor is a 
global gain that arises from using the caches to create simultaneous coded-multicasting opportunities. See 
Example 2 for an illustration of the operational meaning of these three factors. 



B. Discussion 

It is instructive to examine the performance of Algorithm 1 for large and small values of cache size 
M. For simplicity, we focus here on the most relevant case $N \ge K$, i.e., the number of files is at least 
as large as the number of users, so that the rate $R_1(M)$ of Algorithm 1 is given by Remark 2. 

As a baseline, we compare the result with the conventional scheme, introduced in Example 1 in 
Section II. This conventional caching scheme is the best previously known algorithm with a decentralized 
placement phase. For $N \ge K$, the conventional scheme achieves the rate 

$$R_C(M) = K \cdot (1 - M/N), \qquad (1)$$

which is linear with slope of -K/N throughout the entire range of M. 

1) Small M: For small cache size $M \in [0, N/K]$, the rate achieved by Algorithm 1 behaves 
approximately³ as 

$$R_1(M) \approx K \cdot \left(1 - \frac{KM}{2N}\right). \qquad (2)$$

In this regime, $R_1(M)$ scales approximately linearly with the memory size: increasing M by one decreases 
the per-user rate by approximately K/(2N). This is illustrated in Fig. 4. 
Comparing (1) and (2), we have the following observations: 

• Order-K Improvement in Slope: The slope of $R_1(M)$ around M = 0 is approximately K/2 times 
steeper than the slope of $R_C(M)$. Thus, the reduction in rate as a function of M is approximately 
K/2 times faster for Algorithm 1 than for the conventional scheme. In other words, for small M 
the scheme proposed here makes approximately K/2 times better use of the cache resources: an 
improvement on the order of the number of users in the system. This behavior is clearly visible in 
Fig. 1 in Section I. 

• Virtual Shared Cache: Consider a hypothetical setting in which the K cache memories are collocated 
and shared among all K users. In this hypothetical system, arising from allowing complete cooperation 
among the K users, it is easy to see that the optimal rate is $K \cdot (1 - KM/N)$. Comparing this to (2), 
we see that, up to a factor 2, the proposed Algorithm 1 achieves the same order behavior. Therefore, 
the proposed algorithm is essentially able to create a single virtually shared cache, even though the 
caches are isolated without any cooperation between them. 

2) Large M: On the other hand, for $M \in [N/K, N]$ we can approximate⁴ 

$$R_1(M) \approx K \cdot (1 - M/N) \cdot \frac{N}{KM}. \qquad (3)$$

In this regime, the rate achieved by Algorithm 1 scales approximately inversely with the memory size: 
doubling M approximately halves the rate. This is again illustrated in Fig. 4. 
Comparing (1) and (3), we have the following observation: 

• Order-K Improvement in Rate: In this regime, the rate of the proposed scheme can be up to a factor 
K smaller than the conventional scheme: again an improvement on the order of the number of users 
in the system. This behavior is clearly visible in Fig. 1 in Section I. 

³ More precisely, $R_1(M) = K - \frac{K(K+1)}{2N} M + O(M^2)$, and by analyzing the constant in the $O(M^2)$ expression it can be shown that this 
is a good approximation in the regime $M \in [0, N/K]$. 

⁴ Indeed, since $(1 - M/N)^K \le (1 - 1/K)^K \le 1/e$, we have $R_1(M) = \Theta(N/M - 1)$ in the regime $M \in [N/K, N]$, and the pre-constant 
in the order notation converges to 1 as $M \to N$. 







Fig. 4. Memory-rate tradeoff $R_1(M)$ achieved by Algorithm 1 for N = 100 files and K = 5 users (see Theorem 1). The function 
$R_1(M)$ behaves approximately linearly for $M \in [0, N/K]$ and behaves approximately as $K \cdot (1 - M/N) \cdot \frac{N}{KM}$ for $M \in [N/K, N]$ (both 
approximations are indicated by dotted curves). 
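The two regimes shown in Fig. 4 can be tabulated numerically. The sketch below is only an illustration with hypothetical function names; it evaluates the exact rate of the first delivery procedure (valid for $N \ge K$ and M > 0) next to the small-M approximation (2) and the large-M approximation (3) for N = 100 and K = 5.

```python
def first_procedure_rate(N: int, K: int, M: float) -> float:
    """Exact rate of the first delivery procedure (Remark 2), for M > 0."""
    q = M / N
    return K * (1 - q) * (N / (K * M)) * (1 - (1 - q) ** K)


def small_m_approx(N: int, K: int, M: float) -> float:
    """Approximation (2): K (1 - KM / (2N)), intended for M in [0, N/K]."""
    return K * (1 - K * M / (2 * N))


def large_m_approx(N: int, K: int, M: float) -> float:
    """Approximation (3): K (1 - M/N) N / (KM), intended for M in [N/K, N]."""
    return K * (1 - M / N) * N / (K * M)


if __name__ == "__main__":
    N, K = 100, 5
    print("M  exact  approximation")
    for M in (2, 5, 10, 20, 40, 80):
        if M <= N / K:
            approx = small_m_approx(N, K, M)
        else:
            approx = large_m_approx(N, K, M)
        print(M, round(first_procedure_rate(N, K, M), 3), round(approx, 3))
```

Since $1 - (1 - M/N)^K \le 1$, approximation (3) never falls below the exact rate; it becomes tight as M approaches N.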



C. Comparison with Centralized Scheme 

We have compared the performance of the decentralized Algorithm 1 to the conventional caching 
scheme, which is the best previously known decentralized algorithm for this setting. We now compare the 
decentralized Algorithm 1 to the rate achievable by centralized caching schemes. We start by comparing 
the rate of Algorithm 1 to an information-theoretic lower bound on the rate of any centralized caching 
scheme. We then compare the rate of Algorithm 1 to the rate of the best known centralized caching 
scheme recently introduced in [8]. 

Theorem 2. Let $R_1(M)$ be the rate of the decentralized caching scheme given in Algorithm 1, and let 
$R^*(M)$ be the rate of the optimal centralized caching scheme. For any number of files N and number of 
users K and for any $M \in [0, N]$, we have 

$$\frac{R_1(M)}{R^*(M)} \le 12.$$

The proof of Theorem 2, presented in Appendix B, uses an information-theoretic argument to lower 
bound the rate of the optimal scheme $R^*(M)$. As a result, Theorem 2 implies that no scheme (centralized, 
decentralized, with linear caching, nonlinear caching, . . . ) regardless of its computational complexity 
can improve by more than a constant factor upon the efficient decentralized caching scheme given by 
Algorithm 1 presented in this paper. 

We now compare the rate $R_1(M)$ of the decentralized Algorithm 1 to the rate $R_2(M)$ of the best known 
centralized caching scheme. By [8, Theorem 2], $R_2(M)$ is given by 

$$R_2(M) \triangleq K \cdot (1 - M/N) \cdot \min\left\{ \frac{1}{1 + KM/N}, \frac{N}{K} \right\}$$

for $M \in \frac{N}{K} \{0, 1, \dots, K\}$, and the lower convex envelope of these points for all remaining values of 
$M \in [0, N]$. Fig. 1 in Section I compares the performance $R_2(M)$ of this centralized scheme to the 
performance $R_1(M)$ of the decentralized caching Algorithm 1 proposed here. As can be seen from the 
figure, the centralized and decentralized caching algorithms are very close in performance. Thus, there is 
only a small price to be paid for decentralization. Indeed, we have the following corollary to Theorem 2. 

Corollary 3. Let $R_1(M)$ be the rate of the decentralized caching scheme given in Algorithm 1, and let 
$R_2(M)$ be the rate of the centralized caching scheme from [8]. For any number of files N and number 
of users K and for any $M \in [0, N]$, we have 

$$\frac{R_1(M)}{R_2(M)} \le 12.$$

Corollary 3 shows that the rate achieved by the decentralized caching scheme given by Algorithm 1 is 
at most a factor 12 worse than the one of the best known centralized algorithm from [8]. This bound can 
be tightened numerically to 

$$\frac{R_1(M)}{R_2(M)} \le 1.6$$

for all values of K, N, and M. Hence, the rate of the decentralized caching scheme proposed here is 
indeed quite close to the rate of the best known centralized caching scheme. 
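The ratio between the decentralized and centralized rates can be explored numerically. The sketch below is an illustration under a stated restriction: it assumes $N \ge K$, in which case the corner points of the centralized rate reduce to $(K - t)/(1 + t)$ at $M = tN/K$ and form a convex sequence, so the lower convex envelope is obtained by joining neighboring corner points with straight lines. The function names are hypothetical, and the grid of cache sizes is an arbitrary choice.

```python
import math


def decentralized_rate(N: int, K: int, M: float) -> float:
    """Rate R_1(M) of the decentralized scheme for M > 0."""
    q = M / N
    return K * (1 - q) * min((N / (K * M)) * (1 - (1 - q) ** K), N / K)


def centralized_rate(N: int, K: int, M: float) -> float:
    """Centralized rate R_2(M), assuming N >= K (piecewise-linear envelope)."""
    if N < K:
        raise ValueError("this sketch only handles N >= K")
    t = K * M / N                      # position between corner points 0..K
    lo = min(math.floor(t), K - 1)     # left corner of the segment containing t

    def corner(i: int) -> float:
        # Rate at the corner point M = i N / K when N >= K.
        return (K - i) / (1 + i)

    return corner(lo) + (t - lo) * (corner(lo + 1) - corner(lo))


if __name__ == "__main__":
    N, K = 100, 20
    print("R_2(25) =", centralized_rate(N, K, 25))
    ratios = [decentralized_rate(N, K, M) / centralized_rate(N, K, M)
              for M in range(1, N)]
    print("largest ratio on the grid:", round(max(ratios), 3))
```

For this particular choice of N and K, the largest ratio on the grid stays well below the numerical bound quoted above.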



IV. Implementation Aspects and Example Scenarios 

This section discusses how some of the implementation issues arising in practical systems can be dealt 
with. Section IV-A addresses limitations of the system model; Section IV-B discusses several example 
systems. 



A. Relaxing the Model Assumptions 

The caching problem as analyzed so far makes several simplifying assumptions to ensure its analytical 
tractability. We now highlight some of these assumptions and discuss how they can be dealt with. 

• Single Shared Link: We consider a situation consisting only of a single shared link. This link models 
the bottleneck in the system. In a wireline system, this link might be the one closest to the server, 
which would see the largest demand. Alternatively, this situation could arise in a wireless setting, 
where the shared link models the broadcast nature of the physical medium. In this case, the server 
is to be thought of as being placed at (or close to) the cellular base station or wireless access point. 

• Broadcast: The communication between the server and the users is modeled as taking place over a 
broadcast channel. In the wireless setting, this channel occurs naturally due to the broadcast nature 
of the physical channel. In a wireline setting the broadcast channel is to be interpreted as a logical 
abstraction and would be implemented through multicasting over the larger network (say the Internet). 
The corresponding multicast tree could be either established directly in the IP layer or alternatively 
in the application layer by setting up dedicated servers along the path from the central server to the 
users. 

• Worst-Case Demands: The problem formulation focuses on worst-case requests. In some situations, 
this is the correct figure of merit. For example, in a wireless scenario, whenever the delivery rate 
required for a request exceeds the available link bandwidth, the system will be in outage, degrading 
user experience. In other situations, for example a wireline scenario, excess rates might only incur a 
small additional cost and hence might be acceptable. In this situation, the N files are to be understood 
as the most popular ones. Whenever a user requests a file outside the first N (which should happen 
with small enough probability), this request could be served through a separate unicast transmission. 

• Rate over Private Links: The figure of merit R(M) in our problem formulation is solely concerned 
with the rate over the shared link. The rates over the private links (i.e., the final piece of the link 
going to the user) are ignored. This is the appropriate approach in a wireless scenario, since the 
private links are then absent. However, in a wireline scenario, the rates over the private links could
easily become the bottleneck themselves. As it turns out, for the caching algorithm proposed in this
paper, this is not the case. This is discussed in more detail in Example 3 in Section IV-B.

• Synchronized Users: The delivery phase is modeled as being synchronous, as all requests are assumed 
to arrive at the server at the same time. In a real system, requests will instead arrive asynchronously. 
Synchronization could be achieved by having the server wait until the requests from all users 
are registered before transmitting over the shared link. However, this leads to intolerable delays, 
particularly for streaming content such as on-demand movies. This problem can be mitigated by 
dividing the content (say the movie) into smaller segments (say 10 seconds of video). This approach is
elaborated on in Example 6 in Section IV-B.

• Known Number of Users: The problem setting assumes that the number of users K is constant. In 
particular, K is assumed to be the same in the placement phase as in the delivery phase. Since it is 
hard to know how many users will be active several hours in the future, this is clearly not a very 
realistic assumption. However, as we will see later, one of the salient features of the decentralized 
caching algorithm presented in this paper is that advance knowledge of the number of users in the 
delivery phase is not required. See Example 4 in Section IV-B for more details.

• One Request per User: Each user is assumed to request exactly one file during the delivery phase. In 
a real system, users might request more than one or, indeed, none of the files in the database. This 
situation can be dealt with by handling each request as a separate user. This will result in a system 
with a highly variable number of "virtual" users. However, as we have discussed in the last item, 
this variability in the number of users can be dealt with. 

• Equal File Size: All files in the problem setting have the same size of F bits. In practice, files would 
have different file sizes. This situation can be handled by splitting each file into smaller segments 
of constant size. This, in turn, leads to the situation in which some users request more than just one 
segment, which was addressed in the previous item. 

• Known Cache Content: During the delivery phase, the server is assumed to know the cache content 
of all K users. This assumption is critical, as otherwise the server is unable to send "complementary" 
information to the users. There are two ways to guarantee this property. First, the users can explicitly 
inform the server of the content of their caches. If the content is distributed in large enough blocks, 
then the overhead resulting from this is negligible. Second, the server and users can agree beforehand 
on a particular caching strategy, parametrized perhaps by some unique ID of the user (say the IP 
address). Once the server knows this ID, it can compute the user's cache content without further
communication. 

• Cache at Users: In the scenario considered here, each user has access to its own private cache. Such 
a situation can arise in a real system (a set-top box might have an integrated hard drive that could 
be used for caching, for example). However, caches can also be located close to, but not directly at, 
the user. The results and techniques developed here apply to those scenarios as well, as is discussed 
in Example 5 in Section IV-B.
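
The second option in the Known Cache Content item (agreeing beforehand on a caching strategy parametrized by a user ID) can be sketched as follows. This is only an illustration under assumptions that are not part of the text: the helper name `cached_bit_indices`, the use of a SHA-256 hash of the ID as a seed, and Python's `random.Random` as the shared pseudo-random generator are all hypothetical choices.

```python
import hashlib
import random


def cached_bit_indices(user_id: str, file_index: int, file_bits: int,
                       cache_fraction: float) -> set:
    """Hypothetical rule: which bit positions of a file a user caches.

    The seed is derived from the user ID and the file index, so the server
    can recompute the same set without any message from the user.
    """
    digest = hashlib.sha256(f"{user_id}:{file_index}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    num_cached = int(cache_fraction * file_bits)
    return set(rng.sample(range(file_bits), num_cached))


# The user fills its cache during placement ...
user_view = cached_bit_indices("192.0.2.7", file_index=0, file_bits=1000,
                               cache_fraction=0.25)
# ... and the server later reconstructs the same set from the ID alone.
server_view = cached_bit_indices("192.0.2.7", file_index=0, file_bits=1000,
                                 cache_fraction=0.25)
```

Both calls return the same set, so no per-user signaling of cache content is needed. Whether seeds derived this way are "random enough" for the analysis to carry over is an assumption of the sketch.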

B. Example Scenarios 

We now discuss the application of the proposed caching scheme given in Algorithm 1 to several example
scenarios. 

Example 3 (Rate over Private Links). As we have seen, the proposed caching scheme can significantly 
reduce the traffic rate over the shared (bottleneck) link. This may raise the question of whether this rate reduction
is achieved at the expense of a rate increase over the dedicated links branching off the shared link to the
end users. The answer to this question is negative. In fact, the normalized rate over each branch link is
exactly the same as that resulting from the conventional caching scheme introduced in Example 1 in Section II.
Recall that, in the conventional scheme, each user caches MF/N bits of each file and receives the
remaining (1 - M/N)F bits of the requested file from the server. Thus, the normalized rate over each
branch link resulting from the conventional scheme is 1 - M/N.

Perhaps surprisingly, in the proposed scheme, the rate at each branch is also 1 - M/N. To see this,
let us focus on the scenario with N = 2 files and K = 2 users as described in Example 2 in Section III.
Observe that a user does not need all messages sent by the server over the shared link in order to recover
its requested file. For example, in order to recover file A, user one only needs $A_2 \oplus B_1$ and $A_\emptyset$. Thus,
instead of broadcasting to all users, the server can set up several multicast trees, one for each linear 
combination to be sent. The members of each such multicast tree only include those users that actually 
require the linear combination in question. 

The resulting normalized rate over the branch link to user one is then 

$$(M/2)(1 - M/2) + (1 - M/2)^2 = 1 - M/N.$$

This is the same as in the conventional caching scheme. Using the same argument, it can easily be shown 
that the normalized rate over each branch link resulting from the proposed scheme is the same as the 
one resulting from the conventional caching scheme for arbitrary numbers of users and files. Thus, the 
reduction in rate over the shared link in the proposed scheme does not result in an increase in rate over 
the private branch links. 
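
The claim for arbitrary numbers of users can be checked numerically. The sketch below assumes the expected segment sizes from Appendix A (a message for a subset of s users has normalized size $q^{s-1}(1-q)^{K-s+1}$ with q = M/N) and ignores the o(F) terms; the function name `branch_rate` is hypothetical.

```python
from math import comb


def branch_rate(num_users: int, q: float) -> float:
    """Normalized rate on one user's private link under coded multicasting.

    User k only listens to the transmissions for subsets S that contain k.
    There are comb(K-1, s-1) such subsets of size s, each of normalized size
    q**(s-1) * (1-q)**(K-s+1) (expected value, o(F) terms ignored).
    """
    K = num_users
    return sum(comb(K - 1, s - 1) * q ** (s - 1) * (1 - q) ** (K - s + 1)
               for s in range(1, K + 1))


# N = 2 files, K = 2 users, M = 1: matches (M/2)(1 - M/2) + (1 - M/2)^2.
rate_two_users = branch_rate(2, q=1 / 2)
```

By the binomial theorem the sum collapses to 1 - q = 1 - M/N for every K, which is the conventional scheme's branch rate.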

Example 4 (Unknown Number of Users). Our analysis so far has assumed that the number K of users 
sharing the bottleneck link in the delivery phase is already known in the placement phase. In most 
situations arising in practice, this would likely not be the case. For example, in wireline networks, some 
of the users may not request any file in the delivery phase. In wireless networks, users may move from 
one network or cell to another. In either case, the result is that the precise number and identity of users 
in the delivery phase is unknown in the placement phase. Moreover, users might not even be connected 
to the same server in the two phases. 

One of the salient features of the decentralized algorithm proposed in this paper is that it can easily 
deal with these situations. Indeed, it is easy to see that the placement procedure of Algorithm 1 is
completely independent of the identity or even the number of users sharing the bottleneck link in the
delivery phase: each user randomly selects an M/N fraction of each file, completely independently
of other users. Furthermore, the prefetching operation can be performed by separate servers without any 
need for coordination between them. Therefore, regardless of the number of users in the delivery phase, 
and regardless of the network they are in during the placement phase, the rate in the delivery phase is 
within a constant factor of the optimal rate with known number and identity of users. 

We point out that this robustness of the proposed caching algorithm is due to its decentralized nature. 
The centralized caching algorithm proposed in [8] does not have this property, and hence requires the
number and identity of the users in the delivery phase to be known in the placement phase. 
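
A minimal sketch of the placement step described in this example, assuming bit-level random selection and toy parameters chosen only for illustration (the function name `place_user_cache` is hypothetical):

```python
import random


def place_user_cache(num_files: int, file_bits: int, cache_files: float,
                     rng: random.Random) -> list:
    """One user's placement: an independent random M/N fraction of every file.

    Nothing here depends on the number or identity of the other users.
    """
    bits_per_file = int(cache_files * file_bits / num_files)  # MF/N bits
    return [set(rng.sample(range(file_bits), bits_per_file))
            for _ in range(num_files)]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
N, F, M = 4, 1000, 1    # hypothetical toy parameters
caches = [place_user_cache(N, F, M, rng) for _ in range(3)]
# A user that shows up later runs exactly the same procedure.
caches.append(place_user_cache(N, F, M, rng))
```

The function takes no argument describing other users, which is the property that lets the number of users change between the placement and delivery phases.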

Example 5 (Shared Caches). The problem setting considered throughout this paper assumes that each 
user has access to a private cache. In this example, we evaluate the gain of shared caches. This situation 
arises when the cache memory is located close to but not directly at the users. 

We consider a system with K users partitioned into subsets, where users within the same subset share 
a common cache. For simplicity, we assume that these subsets have equal size of L users, where L is 
a positive integer dividing K. We also assume that the number of files N is greater than the number of
caches K/L. To keep the total amount of cache memory in the system constant, we assume that each of 
the shared caches has size LMF bits. 

We can operate this system as follows. Define K/L super users, one for each subset of L users. Run 
the placement procedure of Algorithm 1 for these K/L super users with cache size LMF. In the delivery
phase, treat the (up to) L files requested by the users in the same subset as a single super file of size LF.
Applying Theorem 1 to this setting yields an achievable rate of⁵

$$R_{1,L}(M) = K \cdot (1 - LM/N) \cdot \frac{N}{KM} \bigl(1 - (1 - LM/N)^{K/L}\bigr).$$

Let us again consider the regimes of small and large values of M for $R_{1,L}(M)$. For $M \in [0, N/K]$, we
have

$$R_{1,L}(M) \approx K \cdot \Bigl(1 - \frac{K}{2N} M\Bigr).$$

Comparing this to the small-M approximation of $R_1(M)$ (for a system with private caches), we see
that

$$R_{1,L}(M) \approx R_1(M),$$

i.e., there is essentially no effect on the achievable rate from sharing a cache. This should not come as 
a surprise, since we have already seen in Section III-B that for small M, $R_1(M)$ behaves almost like a
system in which all K caches are combined, and hence there is essentially no sizable gain to be achieved 
for having collaboration among caches. 

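Consider then the regime M > N/K. Here, we have

$$R_{1,L}(M) \approx K \cdot (1 - LM/N) \cdot \frac{N}{KM},$$

and, from the corresponding approximation for private caches,

$$R_1(M) \approx K \cdot (1 - M/N) \cdot \frac{N}{KM}.$$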

The difference between the two approximations is only in the second factor. We recall that this second 
factor represents the caching gain due to making part of the files available locally. Quite naturally, this 
part of the caching gain improves through cache sharing, as a larger fraction of each file can be stored 
locally. 
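
The two regimes can be illustrated numerically. The sketch below evaluates the expression of Theorem 1 and applies the substitution described in the footnote (M replaced by LM, K by K/L, result multiplied by L); the parameter values and function names are hypothetical.

```python
def rate_private(K: int, N: int, M: float) -> float:
    """R_1(M) from Theorem 1 (private caches), for 0 < M <= N."""
    q = M / N
    return K * (1 - q) * min(N / (K * M) * (1 - (1 - q) ** K), N / K)


def rate_shared(K: int, N: int, M: float, L: int) -> float:
    """R_{1,L}(M): K/L super users, cache size L*M, super files of size L*F."""
    return L * rate_private(K // L, N, L * M)


K, N, L = 40, 100, 4          # hypothetical parameters with N > K/L
small_M, large_M = 0.5, 20.0  # small_M < N/K = 2.5 < large_M, and L*large_M <= N
ratio_small = rate_shared(K, N, small_M, L) / rate_private(K, N, small_M)
ratio_large = rate_shared(K, N, large_M, L) / rate_private(K, N, large_M)
```

For these values `ratio_small` is close to one, while `ratio_large` is close to (1 - LM/N)/(1 - M/N), reflecting the improved local caching gain.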

Example 6 (Asynchronous User Requests). Up to this point, we have assumed that in the delivery phase
all users reveal their requests simultaneously, i.e., that the users are perfectly synchronized. In practice, 
however, users would reveal their requests at different times. In this example, we show that the proposed 
algorithm can be modified to handle such asynchronous user requests. 

We explain the main idea with an example. Consider a system with N = 3 files A, B,C, and K = 3 
users. We split each file into J consecutive segments, e.g., $A = (A^{(1)}, \ldots, A^{(J)})$ and similarly for B and
C. Here J is a positive integer selected depending on the maximum tolerated delay, as will be explained 
later. To be specific, we choose J = 4 in this example. 

In the placement phase, we simply treat each segment as a file. We apply the placement procedure of
Algorithm 1. For the delivery phase, consider an initial request $d_1$ from user one, say for file A. The server
responds by starting delivery of the first segment $A^{(1)}$ of file A. Meanwhile, assume that user two requests
file $d_2$, say B, as shown in Fig. 5. The server puts the request of user two on hold, and completes the
delivery of $A^{(1)}$ for user one. It then starts to deliver the second segment $A^{(2)}$ of A and the first segment
$B^{(1)}$ of B using the delivery procedure of Algorithm 1 for two users. Delivery of the next segments $A^{(3)}$
and $B^{(2)}$ is handled similarly. Assume that at this point user three requests file $d_3$, say C, as shown in
Fig. 5. The server reacts to this request after finishing the current delivery phase by delivering $A^{(4)}$, $B^{(3)}$,
and $C^{(1)}$ to users one, two, and three, respectively, using the delivery procedure of Algorithm 1 for three
users. The process continues in the same manner as depicted in Fig. 5.

We note that users two and three experience delays of $\Delta_2$ and $\Delta_3$, respectively, as shown in the figure.
The maximum delay depends on the size of the segments. Therefore, segment size, or equivalently the 
value of J, can be adjusted to ensure that this delay is tolerable. 

⁵ This can be derived from $R_1(M)$ in Theorem 1 by replacing M by LM (since each cache now has size LMF instead of MF), replacing
K by K/L (since there are K/L super users), and multiplying the result by an extra factor of L (since each super file is L times the size
of a normal file).

Fig. 5. The proposed scheme over segmented files can be used to handle asynchronous user requests. In this example, each file is split
into four segments. Users two and three are served with a delay of $\Delta_2$ and $\Delta_3$, respectively.



We point out that the number of effective users in the system varies throughout the delivery phase. 
Due to its decentralized nature, the proposed caching algorithm is close to optimal for any number of
users, as discussed in Example 4. This is instrumental for the segmentation approach just discussed to be
efficient. 
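
The segment schedule of this example can be sketched as follows. The slot model (one slot per jointly delivered group of segments) and the function name `segment_schedule` are assumptions made for illustration; the coded delivery within each slot is not modeled.

```python
def segment_schedule(first_slot: dict, num_segments: int) -> list:
    """Which segment of which file is delivered in each delivery slot.

    first_slot maps a requested file to the slot in which its first segment
    is sent (the slot after the request arrived). One slot is the time needed
    to deliver one segment per active user.
    """
    last_slot = max(first_slot.values()) + num_segments
    schedule = []
    for slot in range(last_slot):
        active = [(name, slot - start + 1)            # (file, segment index)
                  for name, start in sorted(first_slot.items())
                  if 0 <= slot - start < num_segments]
        schedule.append(active)
    return schedule


# Arrival pattern of the example above: A is served from slot 0, B from
# slot 1, and C from slot 3, with J = 4 segments per file.
schedule = segment_schedule({"A": 0, "B": 1, "C": 3}, num_segments=4)
```

Each entry of `schedule` lists the segments that would be handed to the delivery procedure of Algorithm 1 in that slot, so the number of effective users per slot is simply the length of the entry.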

Example 7 (Caching Random Linear Combinations is Inefficient). Prefetching random linear combinations 
of file segments is a popular scheme for prefetching, and is advocated in some papers [9]. In this example,
we argue that in some scenarios this form of prefetching can be quite inefficient. 

To be precise, let us focus on a specific scenario with K users and N = K files, where each user 
has sufficient cache memory to store half of the files, i.e., M = N/2 = K/2. According to Theorem 1,
Algorithm 1 achieves a rate of less than one, i.e., $R_1(M) < 1$.

On the other hand, the rate achieved by caching of random linear combinations can be shown to be at
least K/4, which is significantly larger than $R_1(M)$ for a large number of users K. Indeed, assume that
user one requests file A. Recall that each user has cached F/2 random linear combinations of file A. With 
high probability, these random linear combinations span a F/2-dimensional space at each user and the 
subspaces of different users do not overlap. For example, consider users two and three. As a consequence 
of this lack of overlap, these two users do not have access to a shared part of the file A. This implies 
that, in the delivery phase, the server cannot form a linear combination that is simultaneously useful for 
three users. In other words, the server can form messages that are simultaneously useful for at most
two users. Since each of the K users is missing F/2 dimensions of its requested file, a short calculation
reveals that the server then has to send at least (K · F/2)/2 = FK/4 bits over the shared
link. 

Intuitively, this inefficiency of caching random linear combinations can be interpreted as follows. The 
placement phase follows two conflicting objectives: The first objective is to spread the available content
as much as possible over the different caches. The second objective is to ensure maximum overlap among
different caches. The system performance is optimized if the right balance between these two objectives is 
struck. Caching random linear combinations maximizes the spreading of content over the available caches, 
but provides minimal overlap among them. At the other extreme, the conventional scheme maximizes the 
overlap, but provides only minimal spreading of the content. 

Appendix A 
Proof of Theorem 1

Note that, since there are a total of N files, the operations in Line 3 of Algorithm 1 satisfy the memory
constraint of MF bits at each user. Hence the placement phase of Algorithm 1 is correct.

For the delivery phase, assume the server uses the first delivery procedure, and consider a bit in the 
file requested by user k. If this bit is already cached at user k, it does not need to be sent by the server. 
Assume then that it is cached at some (possibly empty) set T of users with $k \notin T$. Consider the set
$S = T \cup \{k\}$ in Line 8. By definition, the bit under consideration is contained in $V_{k, S \setminus \{k\}}$, and as a
consequence, it is included in the sum sent by the server in Line 9. Since $k \in S \setminus \{\tilde{k}\}$ for every other
$\tilde{k} \in S$, user k has access to all bits in $V_{\tilde{k}, S \setminus \{\tilde{k}\}}$ from its own cache. Hence, user k is able to recover the
requested bit from this sum. This shows that the first delivery procedure is correct. 

The second delivery procedure is correct as well since the server sends in Line 15 enough linear
combinations of every file for all users to successfully decode. This shows that the delivery phase of
Algorithm 1 is correct.

It remains to compute the rate. We start with the analysis of the second delivery procedure. If $N \leq K$,
then in the worst case there is at least one user requesting every file. Consider then all users requesting file 
n. Recall that each user requesting this file already has FM/N of its bits cached locally by the operation 
of the placement phase. An elementary analysis reveals that with high probability for F large enough at 
most 

F(l - M/N) + o(F) 

random linear combinations need to be sent in Line 15 for all those users to be able to decode. We will
assume that the file size F is large and ignore the o(F) term in the following. Since this needs to be done 
for all N files, the normalized rate in the delivery phase is 

(1 - M/N)N. 

If N > K, then there are at most K different files that are requested. The same analysis yields a 
normalized rate of 

(1 - M/N)K. 

Thus, the second procedure has a normalized rate of 

$$R(M) = (1 - M/N) \min\{K, N\} = K (1 - M/N) \min\{1, N/K\} \tag{4}$$

for $M \in (0, N]$.

We continue with the analysis of the first delivery procedure. Consider a particular bit in one of the 
files, say file n. Since the choice of subsets is uniform, by symmetry this bit has probability 

$$q \triangleq M/N \in (0, 1]$$

of being in the cache of any fixed user. Consider now a fixed subset of t out of the K users. The probability
that this bit is cached at exactly those t users is

$$q^t (1 - q)^{K-t}.$$

Hence the expected number of bits of file n that are cached at exactly those t users is

$$F q^t (1 - q)^{K-t}.$$

In particular, the expected size of $V_{k, S \setminus \{k\}}$ with $|S| = s$ is

$$F q^{s-1} (1 - q)^{K-s+1}.$$

Moreover, for F large enough the actual realization of the random number of bits in $V_{k, S \setminus \{k\}}$ is in the
interval

$$F q^{s-1} (1 - q)^{K-s+1} \pm o(F).$$

For ease of exposition, we will again ignore the o(F) term in the following. 

Consider a fixed value of s in Line 7 and a fixed subset S of cardinality s in Line 8. In Line 9, the
server sends

$$\max_{k \in S} |V_{k, S \setminus \{k\}}| = F q^{s-1} (1 - q)^{K-s+1}$$

bits. Since there are $\binom{K}{s}$ subsets S of cardinality s, the loop starting in Line 8 generates

$$\binom{K}{s} F q^{s-1} (1 - q)^{K-s+1}$$

bits. Summing over all values of s yields a total of

$$R(M) F = F \sum_{s=1}^{K} \binom{K}{s} q^{s-1} (1 - q)^{K-s+1} = F \, \frac{1 - q}{q} \bigl(1 - (1 - q)^K\bigr)$$

bits being sent over the shared link. Substituting the definition of q = M/N yields a rate of the first
delivery procedure of

$$R(M) = (N/M - 1) \bigl(1 - (1 - M/N)^K\bigr) = K \cdot (1 - M/N) \cdot \frac{N}{KM} \bigl(1 - (1 - M/N)^K\bigr) \tag{5}$$

for $M \in (0, N]$.
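
The closed form of the sum over s used above follows from the binomial theorem. Written out as a sketch, the intermediate steps are:

```latex
\begin{align*}
\sum_{s=1}^{K} \binom{K}{s} q^{s-1} (1-q)^{K-s+1}
  &= \frac{1-q}{q} \sum_{s=1}^{K} \binom{K}{s} q^{s} (1-q)^{K-s} \\
  &= \frac{1-q}{q} \Bigl( \sum_{s=0}^{K} \binom{K}{s} q^{s} (1-q)^{K-s} - (1-q)^{K} \Bigr) \\
  &= \frac{1-q}{q} \bigl( 1 - (1-q)^{K} \bigr).
\end{align*}
```

With q = M/N, the prefactor (1 - q)/q equals N/M - 1.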

Since the server uses the better of the two delivery procedures, (4) and (5) show that Algorithm 1
achieves a rate of

$$R(M) = K \cdot (1 - M/N) \cdot \min\Bigl\{ \frac{N}{KM} \bigl(1 - (1 - M/N)^K\bigr),\, 1,\, \frac{N}{K} \Bigr\}.$$

Using that

$$(1 - M/N)^K \geq 1 - KM/N,$$

this can be simplified to

$$R(M) = K \cdot (1 - M/N) \cdot \min\Bigl\{ \frac{N}{KM} \bigl(1 - (1 - M/N)^K\bigr),\, \frac{N}{K} \Bigr\},$$

concluding the proof.
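
The rate of the first delivery procedure can also be checked by simulation. The sketch below is a simplified model, not the authors' implementation: it assumes K ≤ N, that user k requests file k, and bit-level random placement, and it only counts transmitted bits rather than forming the coded messages.

```python
import random
from itertools import combinations


def simulate_first_delivery(K: int, N: int, M: int, F: int, seed: int = 0) -> float:
    """Normalized rate of the first delivery procedure for one random placement.

    Assumes K <= N and that user k requests file k (all requests distinct).
    """
    rng = random.Random(seed)
    # Placement: every user caches a random subset of M*F/N bits of each file.
    cache = [[set(rng.sample(range(F), M * F // N)) for _ in range(N)]
             for _ in range(K)]
    # size[(k, T)] = number of bits of user k's file cached exactly at users T.
    size = {}
    for k in range(K):
        for bit in range(F):
            holders = frozenset(u for u in range(K) if bit in cache[u][k])
            if k not in holders:
                size[(k, holders)] = size.get((k, holders), 0) + 1
    sent = 0
    for s in range(K, 0, -1):
        for subset in combinations(range(K), s):
            S = frozenset(subset)
            # One coded message per subset, padded to the longest component.
            sent += max(size.get((k, S - {k}), 0) for k in S)
    return sent / F


K, N, M, F = 3, 4, 2, 20000   # hypothetical toy parameters
simulated = simulate_first_delivery(K, N, M, F)
predicted = (N / M - 1) * (1 - (1 - M / N) ** K)   # equation (5)
```

For large F the simulated value should lie close to the prediction of (5); the small remaining gap corresponds to the o(F) terms ignored in the proof.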

Appendix B 
Proof of Theorem 2

Recall from Theorem 1 that

$$R_1(M) = K (1 - M/N) \min\Bigl\{ \frac{N}{KM} \bigl(1 - (1 - M/N)^K\bigr),\, \frac{N}{K} \Bigr\}.$$

Using that

$$(1 - M/N)^K \geq 1 - KM/N$$

and

$$(1 - M/N)^K \geq 0,$$

this can be upper bounded as

$$R_1(M) \leq \min\{N/M - 1,\; K(1 - M/N),\; N(1 - M/N)\} \tag{6}$$

for all $M \in (0, N]$. Moreover, we have from [8, Theorem 2]

$$R^*(M) \geq \max_{s \in \{1, \ldots, \min\{N, K\}\}} \Bigl( s - \frac{s}{\lfloor N/s \rfloor} M \Bigr). \tag{7}$$
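
As a numerical sanity check of the factor-12 claim, the sketch below compares the rate of Theorem 1 with the right-hand side of (7) on a small grid. The grid, the function names, and the convention used at M = 0 are assumptions of this sketch; it is a spot check, not a substitute for the proof.

```python
def rate_algorithm_1(K: int, N: int, M: float) -> float:
    """R_1(M) from Theorem 1; equals min(N, K) at M = 0."""
    if M == 0:
        return float(min(N, K))
    q = M / N
    return K * (1 - q) * min(N / (K * M) * (1 - (1 - q) ** K), N / K)


def cut_set_lower_bound(K: int, N: int, M: float) -> float:
    """Right-hand side of (7): max over s of s - s / floor(N / s) * M."""
    return max(s - s / (N // s) * M for s in range(1, min(N, K) + 1))


worst_ratio = 0.0
for N in range(1, 31):
    for K in range(1, 31):
        for step in range(0, 20):          # M on a grid in [0, N)
            M = N * step / 20
            ratio = rate_algorithm_1(K, N, M) / cut_set_lower_bound(K, N, M)
            worst_ratio = max(worst_ratio, ratio)
```

On this grid `worst_ratio` stays below 12, in line with the bound established in the remainder of this appendix.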

We will treat the cases $0 < \min\{N, K\} \leq 12$ and $\min\{N, K\} \geq 13$ separately. Assume first that
$0 < \min\{N, K\} \leq 12$. By (6),

$$R_1(M) \leq \min\{N, K\}(1 - M/N) \leq 12 (1 - M/N),$$

and by (7) with s = 1,

$$R^*(M) \geq 1 - M/N.$$

Hence

$$\frac{R_1(M)}{R^*(M)} \leq 12 \tag{8}$$

for $0 < \min\{N, K\} \leq 12$.

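Assume in the following that $\min\{N, K\} \geq 13$. We consider the cases

$$M \in \begin{cases} [0, \max\{1, N/K\}], \\ (\max\{1, N/K\}, N/12], \\ (N/12, N], \end{cases}$$

separately. Assume first that $0 \leq M \leq \max\{1, N/K\}$. By (6),

$$R_1(M) \leq \min\{N, K\}(1 - M/N) \leq \min\{N, K\}.$$

On the other hand, by (7) with $s = \lfloor \min\{N, K\}/4 \rfloor$,

$$\begin{aligned}
R^*(M) &\geq s - \frac{s^2}{1 - s/N} \frac{M}{N} \\
&\geq \min\{N, K\}/4 - 1 - \frac{(\min\{N, K\})^2/16}{1 - \min\{N, K\}/(4N)} \frac{M}{N} \\
&= \min\{N, K\} \Bigl( 1/4 - 1/\min\{N, K\} - \frac{\min\{N, K\}/16}{1 - \min\{N, K\}/(4N)} \frac{M}{N} \Bigr) \\
&\overset{(a)}{\geq} \min\{N, K\} \Bigl( 1/4 - 1/13 - \frac{1}{16} \cdot \frac{1}{1 - \min\{N, K\}/(4N)} \Bigr) \\
&\geq \min\{N, K\} \Bigl( 1/4 - 1/13 - \frac{1/16}{1 - 1/4} \Bigr) \\
&\geq \min\{N, K\}/12,
\end{aligned}$$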

where in (a) we have used that $\min\{N, K\} \geq 13$ and $M \leq \max\{1, N/K\}$. Hence

$$\frac{R_1(M)}{R^*(M)} \leq 12 \tag{9}$$

for $\min\{N, K\} \geq 13$ and $0 \leq M \leq \max\{1, N/K\}$.

Assume then that $\max\{1, N/K\} < M \leq N/12$. By (6),

$$R_1(M) \leq N/M - 1 \leq N/M.$$

On the other hand, by (7) with $s = \lfloor N/(4M) \rfloor$,

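$$\begin{aligned}
R^*(M) &\geq s - \frac{s^2}{1 - s/N} \frac{M}{N} \\
&\geq N/(4M) - 1 - \frac{N^2/(16M^2)}{1 - 1/(4M)} \frac{M}{N} \\
&= \frac{N}{M} \Bigl( 1/4 - M/N - \frac{1/16}{1 - 1/(4M)} \Bigr) \\
&\overset{(a)}{\geq} \frac{N}{M} \Bigl( 1/4 - 1/12 - \frac{1/16}{1 - 1/4} \Bigr) \\
&= N/(12M),
\end{aligned}$$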

where in (a) we have used that $M/N \leq 1/12$ and that $M > \max\{1, N/K\} \geq 1$. Hence

$$\frac{R_1(M)}{R^*(M)} \leq 12 \tag{10}$$

for $\min\{N, K\} \geq 13$ and $\max\{1, N/K\} < M \leq N/12$.

Finally, assume that $N/12 < M \leq N$. By (6),

$$R_1(M) \leq N/M - 1.$$

On the other hand, by (7) with s = 1,

$$R^*(M) \geq 1 - M/N.$$

Hence,

$$\frac{R_1(M)}{R^*(M)} \leq \frac{N/M - 1}{1 - M/N} = N/M \leq 12 \tag{11}$$

for $\min\{N, K\} \geq 13$ and $N/12 < M \leq N$.
Combining (8), (9), (10), and (11) yields that

$$\frac{R_1(M)}{R^*(M)} \leq 12$$

for all N, K, and $0 \leq M \leq N$.